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Abstract 

A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior prob¬ 
ability density. The starting point is a series expansion of the likelihood function in terms of orthogonal 
polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated 
semi-analytically. The posterior is formally represented as the product of a reference density and a linear 
combination of polynomial basis functions. Both the model evidence and the posterior moments are related 
to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one 
to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed 
and demonstrated on the basis of simple applications from classical statistics and inverse modeling. 

Keywords: Uncertainty quantification, Bayesian inference, Inverse problem, Surrogate modeling, Polynomial 
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1 Introduction 

In view of inverse modeling mm and uncertainty quantification GO®. Bayesian inference establishes a convenient 
framework for the data analysis of engineering systems m- It adopts probability theory in order to represent, 
propagate and update epistemic parameter uncertainties. The prior distribution captures the uncertainty of 
unknown model parameters before the data are analyzed. A posterior is then constructed as an updated 
distribution that captures the remaining uncertainty after the data have been processed. The computation of 
this posterior is the primary task in Bayesian inference. 

Some simplified statistical models admit closed-form expressions of the posterior density. Beyond these so- 
called conjugate cases, computational approaches either aim at evaluating expectation values under the posterior 
or drawing samples from it [8]. This is usually accomplished with stochastic methods such as Markov chain 
Monte Carlo mm- Nowadays this class of techniques constitutes the mainstay of Bayesian computations. The 
posterior is explored by realizing an appropriate Markov chain over the prior support that exhibits the posterior 
as its long-run distribution. In turn, the obtained sample is used to empirically approximate the statistical 
quantities of interest. These include the characteristics of the posterior and the predictive distribution. This 
way of proceeding suffers from some inherent deficiencies. The presence of sample autocorrelation and the 
absence of a convergence criterion cause severe practical problems. Moreover, Markov chain Monte Carlo 
typically requires a large number of serial forward model runs. Since in engineering applications even a single 
model run can be computationally taxing, this may be prohibitive. In the recent past, numerous enhancements 
have been proposed in order to accelerate Markov chain Monte Carlo for Bayesian inverse problems. This 
includes the implementation of more efficient sampling algorithms, e.g. transitional Markov chain Monte Carlo 
mm or Hamiltonian Monte Carlo mm, and the substitution of the forward model with an inexpensive 
metamodel, e.g. based on Gaussian process models EMU or polynomial chaos expansions |18fEU] . Although 
these approaches promise significant speedups, they still inherit all principle shortcomings of sample-based 
posterior representations. 
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Unfortunately there are only few fundamental alternatives to stochastic sampling. Variational Bayesian 
inference establishes such an alternative where the posterior is sought through deterministic optimization ED- 
123]. In particular, a member from a simple parametric family of probability densities is selected such that 
some distance to the posterior is minimized. In this regard, the Kullback-Leibler divergence is often chosen 
as a relative measure of the dissimilarity of two probability densities. The procedure commonly rests upon 
some simplifying independence assumptions. Variational methods are regarded as less computing intensive 
than Markov chain Monte Carlo, yet they are only approximate. They are prominently used in machine 
learning and computer science [ 2H125] , and since recently such methods are applied to inverse problems ESI HZ], 
too. A particularly interesting implementation of variational Bayesian inference has been proposed in |J8J. 
The posterior is parametrized as a transformation of the prior density and can be computed based on the 
corresponding back-transformation. More specifically, a random variable transformation is sought in polynomial 
form such that the Kullback-Leibler divergence between the prior density and the back-transformed posterior 
density is minimized. This formulation is supported by arguments from optimal transport theory which also 
allows for a practical regularization of the problem. Finally, samples from the posterior distribution are obtained 
by independently sampling from the prior and applying the polynomial map. Another approach to certain 
Bayesian inverse problems has been recently devised in (29, 30]. Based on monomial Taylor expansions of the 
forward model and of the posterior density, the computation of expectation values under the posterior is tackled 
by sparse numerical quadrature. 

In this paper we propose a novel approach to surrogate the posterior probability density in itself. The main 
idea is to decompose the likelihood function into a series of polynomials that are orthogonal with respect to 
the prior density. It is shown that all statistical quantities of interest can then be easily extracted from this 
spectral likelihood expansion. Emulators of the joint posterior density and its marginals are derived as the 
product of the prior, that functions as the reference density, and a linear combination of polynomials, that 
acts as an adjustment. In doing so, the model evidence simply emerges as the coefficient of the constant term 
in the expansion. Moreover, closed-form expressions for the first posterior moments in terms of the low-order 
expansion coefficients are given. The propagation of the posterior uncertainty through physical models can 
be easily accomplished based on a further postprocessing of the expansion coefficients. In this sense, spectral 
Bayesian inference is semi-analytic. While the corrections required for an expansion of the posterior with respect 
to the prior as the reference density may be large, they can be small for an expansion around a properly chosen 
auxiliary density. A change of the reference density is therefore suggested in order to increase the efficiency of 
computing a posterior surrogate. The devised formulation entirely avoids Markov chain Monte Carlo. Instead 
it draws on the machinery of spectral methods |3lh33] and approximation theory [34H36] , It is proposed to 
compute the expansion coefficients via linear least squares E3I2H]. This allows one to make use of the wealth 
of statistical learning methods EH ED] that are designed for this type of problems. The approach features a 
natural convergence criterion and it is amenable to parallel computing. 

The scope of applicability of the proposed approach covers problems from Bayesian inference for which the 
likelihood function can be evaluated and for which polynomials can be constructed that are orthogonal with 
respect to the prior, possibly after a carefully chosen variable transformation. This excludes statistical models 
that involve intractable likelihoods ED Ell, i.e. the likelihood cannot be exactly evaluated. It also excludes 
improper prior distributions E21E3, be. the prior does not integrate to one or any finite value, and models 
with pathologic priors such as the Cauchy distribution for which the moments are not defined EHUllI- Many 
hierarchical Bayesian models (47] I48| are not covered by the devised problem formulation. They are either based 
on conditional priors, which does not allow for orthogonal polynomials, or on integrated likelihoods, which can 
only be evaluated subject to noise. 

Spectral likelihood expansions complement the existing array of Bayesian methods with a way of surro¬ 
gating the posterior density directly. They have the potential to remedy at least some of the shortcomings 
of Markov chain Monte Carlo. Yet, their practical implementation poses challenges. Hence, the goal of this 
paper is to discuss and investigate the possibilities and limitations of the approach. The method of spectral 
likelihood expansions is therefore applied to well-known calibration problems from classical statistics and inverse 
heat conduction. We restrict the analysis to low-dimensional problems. The final results are compared with 
corresponding results from Markov chain Monte Carlo simulations. 

The manuscript is structured as follows. The principles of Bayesian inference are summarized in ??. Sur¬ 
rogate forward modeling with polynomial chaos expansions is reviewed in ??. After that, spectral likelihood 
expansions are introduced as an alternative approach to Bayesian inference in ??. Two well-known Gaus¬ 
sian fitting examples and the identification of thermal properties of a composite material serve as numerical 
demonstrations in ??. Finally, it is summarized and concluded in ??. 
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2 Bayesian inference 


Let x = (# 1 ,... ,Xm) T G B> x with V x = V Xl x ... x T> Xm C R m be a vector of unknown parameters. The 
goal of statistical inference is to deduce these unknowns from the observed data y = (yi,.. •, Pn) T G R w . 
In Bayesian statistics one adapts probabilistic models for representing uncertainty. Hence, let (fi, J 7 , V) be a 
suitable probability space with a sample space a u-field T and a probability measure V. On this space one 
can define a prior model of the unknowns and an observational model of the data that represent the encountered 
parameter uncertainties and the experimental situation, respectively. 

The epistemic uncertainty of the parameter values is cast as a X^-valued random vector X: —> T> x C 

R m . Here, the components of X = (Xi,... ,Xm) t are V Xi -valued random variables Xt: fl —> T> Xi C R for 
i = 1,..., M. Since the data have not been processed at this stage, the joint density of X ~ tt(x ) is called the 
prior density. Similarly, a R^-valued random vector Y: O —> R w represents the observables. The components 
of Y = (Y\ ...., Y/v) T are real-valued random variables Yi: fl —> R for * = 1,..., N. In order to draw inferences 
from the data about the unknowns, one has to formulate an observational model that establishes a relationship 
between those quantities. Commonly this is a probabilistic representation of the observables Y\x ~ f(y\x) 
that is conditional on the unknown parameters. For the actually acquired data Y = y, the likelihood function 
C(x) = f{y\x ) is defined by interpreting the conditional density f(y\x) as a function of the unknowns x. 

Given this setup, one can formulate an updated probability density tt{x\ y) of the unknowns that is condi¬ 
tioned on the realized data. This so-called posterior density results from Bayes’ law 

tt(*I y) = —^—• (!) 

It completely summarizes the available information about the unknowns after the data have been analyzed. 
The model evidence Z properly normalizes the posterior density. It can be written as 

Z = y C(x)Tt(x)dx. (2) 

One is often interested in the marginals and moments of the posterior. The posterior marginal 7r(a ',j \ y) of a 
single unknown Xj with j G {1 ,... ,Mj is defined as 


7 r ( x j\y) = 


n(x\y) d x~j. 


T>, 




(3) 


Here, the simplifying notation x^j = (xi,...,Xj_i,ajj+i,... ,Xm) T is introduced. For the mean E[X|y] and 
the covariance matrix Cov[X|y] of the posterior one has 

E[X| y\ = J xn(x\y)dx, (4) 

v x 

Cov[X|y] = J (x - E[X\y])(x - Fj[X\y]) T ir(x\y) dx. (5) 


More generally, Bayesian inference focuses the computation of posterior expectation values of the quantities 
of interest (Qol) h: V x —» R. These expectations may be formally expressed as 


HH x )\y] = J h(x)n(x\y)dx. (6) 

Da, 

For later considerations, it is remarked that this integration over the posterior density can be interpreted as a 
reweighted integration over the prior density 

nh(X)\y] = J h(x)^-n(x)dx = [h{X)C(X)\. (7) 

T> x 
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2.1 Bayesian inverse problems 

The Bayesian framework described above can be applied to a vast range of scenarios from classical statistics |491 
and inverse modeling EH Eg. In inverse problems, a so-called forward model Ai establishes a mathematical 
representation of the physical system under consideration. It is the function 


At:V x ->R N 
x i-> A4{x) 


( 8 ) 


that maps model inputs x G V x C R M to outputs y = Ai(x) G IR/ V . Inversion is the process of inferring the 
unknown forward model parameters x with the measured data y of its response. 

A probabilistic model of the observables is commonly constructed supposing that they can be represented 
as the sum Y = Y + E of the model response vector Y = Xl(X): H —> ET V and another random vector 
E: Q —> R :V . The latter accounts for measurement noise and forward model inadequacy. It is assumed that the 
residual vector E is statistically independent from X. An unknown realization E = e measures the discrepancy 
between the actually measured data y = y + e and the model response y = At(x) at the true value x. 
Typically one starts from the premise that the residual E ~ A/"(e|0, S) follows a Gaussian distribution. Here, 
E is a symmetric and positive-definite covariance matrix. The observational model is then simply given as 
Y\x ~ N{y\M(x), S). For the likelihood this implies 

C{X) = v^dettE) “ P (4 to “ M{X))T 1 iV - M(X)) ) ■ (9) 

For the actually acquired data y, this is understood as a function of the unknowns x. The Bayesian solution to 
the inverse problem posed is then the posterior in ?? where the likelihood is given as in ??. It summarizes the 
collected information about the unknown forward model inputs. 


2.2 Markov chain Monte Carlo 

Apart from some exceptional cases, the posterior density in ?? does not exhibit a closed-form expression. Thus 
one settles either for computing expectation values under the posterior or for sampling from the posterior. 
The former can be accomplished through stochastic integration techniques such as Monte Carlo (MC) |53| or 
importance sampling [51]. For the latter one usually has to resort to Markov chain Monte Carlo (MCMC) 
sampling mm ■ An ergodic Markov chain Xl 1 ), X ^,... over the support V x is constructed in such a way 
that the posterior arises as the invariant distribution 

■K(x^ t+1 ' ) \y) = J tt(x^\ y) K.(x^ t \x f ' t+1 ' > ) dx^. (10) 

T)*. 

Here, K,(x^\x^ t+1 ^) denotes the density of the transition probability from the state x^ of the Markov chain 
at a time t to its state x^ t+1 ^ at time t + 1. The Metropolis-Hastings (MH) algorithm [551 155] suggests an 
easy principle for the construction of a Markov kernel K that satisfies ??. It is based on sampling candidates 
from a proposal distribution and a subsequent accept/reject decision. The transition kernel defined this ways 
satisfies detailed balance, i.e. time reversibility t:(x^\ y) K,(x( t \x ( ' t+1 ' > ) = 7r(a;^ +1 *| y) K.(x( t+1 \ x^). This is a 
sufficient condition for ?? to apply. In practice, one initializes the Markov chain at some G T> x and then 
iteratively applies the MH updates from x^l to x^ t+1 ' > for a finite number of times T. The ergodic theorem then 
ensures that one can approximate the population average in ?? in an asymptotically consistent way as the time 
average 

e[ (ii) 

t=i 

A whole string a unpleasant consequences is entailed by the fact that MCMC updates are typically local 
and serially dependent. The quality of the posterior approximation is governed by the MCMC sample autocor¬ 
relation. In order to ensure an efficient posterior exploration one has to carefully design and tune the proposal 
distribution. This is an extremely tedious and problem-dependent task. Yet, even for comparably efficient 
MCMC updates, a large number of MCMC iterations may be required in order to achieve an acceptable degree 
of fidelity of the final results. In inverse modeling this requires an even larger number of serial forward solves 
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which can be prohibitively expensive for demanding models. Another intrinsic MCMC weakness is that it lacks 
a clear convergence and stopping criterion, i.e. for diagnosing when the chain has forgotten its initialization 
and has converged to the target distribution in ??, and for the assessment of when the MC error in ?? has 
become sufficiently small. Even though there are more or less sophisticated convergence diagnostics E3EE3, 
those heuristic checks may very well fail, e.g. when separated posterior modes have not yet been detected. 

The model evidence in ?? is important in the context of model comparison and selection |59| . In engineering 
applications it often happens that one wants to judge the performance of various competing models against 
measured data ED nm. While in variational Bayesian inference at least a lower bound of the model evidence 
is implicitly computed as a side product, in the MH algorithm it is not computed at all. Avoiding the explicit 
computation of the model evidence is beneficial for parameter estimation, but it does not allow for model 
selection. To this effect one has to rely on dedicated methods [521 531 . 

3 Surrogate forward modeling 

In the analysis of engineering systems it has become a widespread practice to substitute expensive computer 
models with inexpensive metamodels or surrogate models. Those approximations mimic the functional relation¬ 
ship between the inputs and the outputs of the original model in ??. Metamodeling promises significant gains 
in situations that require a large number of forward model runs, e.g. for optimization problems, uncertainty 
analysis and inverse modeling. Important classes of metamodels are based on Gaussian process models or Krig- 
ing [Mi 55 and polynomial chaos expansions [65;. More recent introductions to these subjects can be found in 
[671 68] and [69] HQ], respectively. Nowadays the application of Kriging [15] HD and polynomial chaos surrogates 
fI5H2D] is commonplace in Bayesian inverse problems. 

We focus on polynomial chaos metamodels next. The idea is to decompose the forward model response into 
polynomial terms that are orthogonal with respect to a weight function. In stochastic analysis this weight is 
often identified with a probability density in order to facilitate uncertainty propagation. In inverse analysis it is 
commonly equated with the prior in order to enhance MCMC posterior sampling. The formalism of polynomial 
chaos expansions is rooted in spectral methods and functional approximations with orthogonal polynomials. 
Hence, the function space point of view is emphasized in this section. We also concentrate on linear least 
squares for the practical computation of the expansions coefficients. 

3.1 L\ function space 

From here on it is assumed that the components of the uncertain parameter vector X = (Xi,... ,Xm) t are 
independent random variables Xi. Thus their joint density can be written as 

M 

n ( x ) = ( 12 ) 

i=1 

Let L 2 (T> X ) = { u: V x —>■ R| f v u 2 (x) n(x) dec < oo} be the Hilbert space of functions that are square integrable 
with respect to the prior density in ??. For u,v £ L^(T> X ) a weighted inner product (•, •)^2 and its associated 
norm ||-||x ,2 are defined as 


(u,v) L 2 = J u(x)v(x) n(x) dx, (13) 

T>a. 

\\u\\li = (u,u)l2 ■ ( 14 ) 

Given that u,v £ L/^(V X ), the real-valued random variables u(X), v(X): fl —> R on the probability space 
(fljXjV) have a finite variance. One can then write the inner product in ?? as the expectation 

(u,v) l z=E[u(X)v(X)]. (15) 

In the further course of the presentation, the identity in ?? is frequently used in order to switch back and forth 
between expectation values under the prior distribution and weighted inner products. 
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3.2 Orthonormal polynomials 

Now a basis of the space L^{V X ) is constructed with orthogonal polynomials |7lH75] . Let {'S'ahaien be a family 
of univariate polynomials in the input variable Xi £ T> Xi . Each member is characterized by its polynomial degree 
a.i £ IN. The polynomials are required to be orthonormal in the sense that 





if OLi = Pi, 
if at ^ Pi. 


(16) 


These polynomials form a complete orthogonal system in L 2 (T> x ). Next, a set of multivariate polynomials 
a£]N M i n the input variables x E T> x is constructed as the tensor product 

M 

*„(*)=(i7) 

i— 1 

Here, cx = (oq,... ,cxm) G IN^ 7 is a multi-index that characterizes the polynomials. By construction, namely 
due to ??????, they are orthonormal in the sense that 


(^,^=^= 1 * ( 18 ) 

10 ilQfp. 

These polynomials establish a complete orthogonal basis in L 2 (2? x ). Note that the constant term is always given 
as TqI = 1 in the univariate case. This ensures the proper normalization ||Tq ^||£2 = 1. In the multivariate 
case one similarly has To = 1 with ||To||l 2 = 1. 

3.3 Hermite and Legendre polynomials 

Two classical univariate families are the Hermite and the Legendre polynomials {H ai (xi)} ai ^ for Xi £ R and 
{P ai (xdlctiGiN for Xi £ [—1,1], respectively. The former are orthogonal with respect to the weight function 
N{xi 10,1) = (27t) - 1 / 2 exp(— xf/2), the latter with respect to Upxp — 1,1) = I[_i,i] (a?i)/2. Here, I[-i,i] denotes 
the indicator function of the interval [—1,1]. A short summary of these two univariate families is given in ??. 
Over the respective domains, their first members are defined as given in the ??. Classical orthogonal polynomials 
{V’ai (£»)}«■ are typically not normalized, e.g. the aforementioned Hermite or Legendre families. An orthonor¬ 
mal family {$a]} ai g|[( is then obtained through an appropriate normalization with = pa^/WPa! II// 2 • 

Table 1: Two families of orthogonal polynomials. 


Input type 

Polynomials 

v Xi 

TTpXi) 


II^IUj 4 

Gaussian 

Hermite 

R 

0 , 1 ) 

H-oii i%i) 

V • 

Uniform 

Legendre 

[-1,1] 

U{xi |-1,1) 

P ai (Xi) 

\/l/ (^ a i + 1) 


In practice, the parameter space T> x and the input distribution tt(x) are often not directly suitable for an 
expansion based on the two standardized families in ??. One possibility is then to employ suitably chosen or 
constructed polynomials ini Eg. Another possibility is to use an invertible function T: R M R M , sufficiently 
well-behaved and as linear as possible, in order to transform the physical variables x into standardized variables 
£ = T(x), i.e. the image V £ = T(V X ) and the transformed weight function |det J 7--1 (£)| 

are of a standard form. Here, J 7--1 = dT -1 /d£ is the Jacobian matrix. If such a change of variables is needed, 
the considerations that are given below for Ad and h in the variables x £ V x can be straightforwardly repeated 
for Ad o T” 1 and h o T _1 in the variables £ £ V^. In this case, the expectation in ?? follows the integration by 
substitution 

nh(X)\y] = ±J /i(7~- 1 (£))£(T- 1 (0) M£)de (19) 
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3.4 Polynomial chaos expansions 

For simplicity, herein the presentation is restricted to scalar-valued models Ai: V x —> R. The extension to 
the multivariate case is straightforward. It is supposed that the forward model Ai € L%(D X ) is mean-square 
integrable. This is a reasonable assumption as seen from a physical perspective. In L/f(T> x ) the generalized 
Fourier expansion of A4 in terms of the orthogonal polynomials {’J'oJaeiN" is then given as 

Ai = ^ a a 'Fa, with (20) 

a a = (At^a)^ — j Ai(x)^ a ( y x)Tr{x)dx. (21) 

Da, 


The generalized Fourier coefficients {a a } a eiN M of the series expansion in ?? are defined as the orthogonal 
projection of Ad onto the basis elements in ??. The corresponding Fourier series of the second-order random 
variable Y = Ai(X) on is a so-called polynomial chaos expansion (PCE) Ai(X) = 8 a'l’a(^)' 

PCEs have been popularized in the context of uncertainty propagation where the goal is the quantification 
of the distribution of Y = Ai(X). For this purpose it comes in handy that the mean and the variance of this 
random variable can be easily determined from its PCE coefficients. Indeed, with ?? it is easy to verify that 
they are simply given as 


E[Ai{X)} 



Var[M(X)] 


y a a d> a 


ao'Fo 


LI 


E -l 

a6l"\{0) 


( 22 ) 

(23) 


The simple identities in ???? follow from the definitions of the inner product and the associated norm in ????, 
respectively. 


3.5 Truncated series 

For a practical computation one has to truncate the infinite series in ??. Let the total degree of a multivariate 
polynomial 'F ct be defined as ||o:||i = ^2iLi\ a i\- A standard truncation scheme is then adopted by limiting the 
terms in ?? to the finite set of multi-indices 


A = ||a||i<p}. (24) 

This specifies a set of polynomials {’F a }ae.4p such that their total degree ||a||i is smaller than or equal to a 
chosen p. The total number of terms retained in the set A p is given as 


f M + p\ _ (M +p)\ 
V P ) M\p\ 


(25) 


The dramatic increase of the total number of terms P with the input dimensionality M and the maximal 
polynomial degree p , that is described by ??, is commonly referred to as the curse of dimensionality. A simple 
idea to limit the number of regressors relies on hyperbolic truncation sets. For 0 < q < 1 a quasinorm is 
defined as ||a|| q = (X^i^il a i l) 1 ^ 9 - The corresponding hyperbolic truncation scheme is then given as Af = {a £ 
IN M | ||o:|| ? < p}. Adopting the standard scheme in ??, a finite version of ?? can be written as 


Adpix^j ^ ^ n a dt(x^x'j . 

a£Ap 


(26) 


In engineering problems, one uses A4 p (x) as a functional approximation of Ai(x), i.e. as a polynomial 
response surface m- This is justified because the approximation converges in the mean-square sense 


M - M, 


Ll 


= E 


(M(X)-M P (X 


= a c 

«6I m \4 p 


—> 0, for p -A oo. 


(27) 
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The rate of the convergence in ?? depends on the regularity of M. On top of that, the response surface in ?? 
is also optimal in the mean-square sense 


M-M p 


2 

hi 


inf \\M-M*f L2 , 
M*e P p " " L * 


where P p = span ({$ a } a6 iJ . 


(28) 


According to ??, the response surface in ?? minimizes the mean-square error over the space of polynomials 
P p = span({’I' Q ,} Q , e ^ p ) having a total degree of at most p. 


3.6 Least squares 


In order to find a metamodel of the form ??, one computes approximations of the exact expansion coefficients in 
??. Broadly speaking, one distinguishes between intrusive and non-intrusive computations. While the former 
class of techniques is based on manipulations of the governing equations, the latter is exclusively build upon 
calls to the forward model at chosen input values. Stochastic Galerkin methods belong to the class of intrusive 
techniques El EH], whereas stochastic collocation mm and projection through numerical quadrature mm 
are non-intrusive approaches. Herein we focus on another non-intrusive formulation that is based on least 
squares regression analysis |83L [S3]. This formulation is based on linear least squares mm and related ideas 
from statistical learning theory EH SO]- Since this includes sparsity-promoting fitting techniques from high¬ 
dimensional statistics [85] (86], recently least squares projection methods receive considerable attention. This 
includes frequentist [5TM0] and Bayesian implementations [THTfTT] of shrinkage estimators. Current results 
regarding the convergence behavior of such regression methods can be found in [f)5l (f)7| . 

We introduce a simplifying vector notation such that a = (ai,...,ap) T and SI/ = (I , i,...,fp) T gather 
and order the coefficients and the polynomials for all a £ A p . For the truncated expression in ?? one thus 
has A i p = a T SF. The problem of finding A4 P € P p that minimizes the mean-square error in ?? may then be 
equivalently rephrased as 


a = arg min E 

a*eR p 


(M(X) -a* T ^(X)Y 


(29) 


The stochastic optimization objective in ?? establishes an alternative to the orthogonal projection in ??. This 
formulation may be more amenable to a numerical computation. At the very least it allows one the draw on 
the machinery of linear least squares in order to compute an approximation a of the exact coefficients a. To 
that end one discretizes ?? as 


a = arg min — (m. {x^) — a* T At (x^ )\ . (30) 

K k =i V ' 

Here, the experimental design X = ( x 6),... , x^) is a representative sample of K forward model inputs, e.g. 
randomly drawn from the input distribution in ??. It is then required to compute the corresponding model 
responses y = (M (a^ 1 )), • • ■ ,M(x^ K ^)) T in K training runs. 

Now let A £ R KxP be the matrix with entries Akj = for k = 1,..., K and l = 1,..., P. Moreover, 

let the system y = Aa be overdetermined with K > P. The linear least squares problem in ?? may then be 
written as 

a = argmin||jy — Aa*|| 2 (31) 

The normal equations (A T A)a = A T y establish the first-order condition for ?? to apply. Given that the 
matrix A r A is non-singular, this linear system is solved by 


a = (A T A)- 1 A T y. (32) 

The positive definiteness of A r A, i.e. the columns of A are linearly independent, is the second-order condition 
for ?? to be the minimum of ??. The ordinary least squares (OLS) solution ?? is commonly computed by means 
of linear algebra methods. An alternative to OLS is least angle regression (LAR) |98] [M] ■ LAR is well-suited 
for high-dimensional PCE regression problems [57i and can be even applied in the underdetermined case. It 
is based on selecting only the most dominant regressors from a possibly large candidate set. The resulting 
predictor is thus sparse as compared to the OLS solution. 








3.7 Prediction errors 


After the computation of a metamodel, one typically wants to assess its prediction accuracy. Moreover, when 
a number of candidate surrogates is computed, one wants to compare their performances in order to eventually 
select the best. Hence, one needs to define an appropriate criterion that allows for an accurate and efficient 
quantification of the approximation errors. The natural measure of the mismatch between the forward model 
M and an approximation M. p is the generalization error Eq b n = E[(A4(W) — _M P (W)) 2 ]. This is exactly the 
error the minimization of which is posed by ??. Since it remains unknown, it cannot be used as a performance 
measure. One could estimate Ec en based on MC simulation, though. However, this is not very efficient 
since it requires the execution of additional forward model runs. In contrast, the empirical error f?E m p = 
K ~ 1 'Ek=i(-M(xW) - M p (x^)) 2 is the quantity that is practically minimized according to ??. This error 
indicator is obtained for free, however, it does not account for overfitting and thus tends to severely underestimate 
the real generalization error EQ en . 

In order to construct an estimate of Eq bii that is more efficient than the MC estimate and more accurate 
than i?Emp) one sometimes resorts to leave-one-out (LOO) cross validation [ ilOO j. Let Ai^ k be the surrogate 
model that is obtained from the reduced experimental design X^ k = (x 1 ' 1 ' 1 ,... ,x^ k ~ 1 \x^ k+1 \ ... ,x^) : i.e. a 
single input x^ has been dropped. The LOO error is then defined as 

. K 2 

E u>o = z'52(M(xW)-M„ k (xWy) . ( 33 ) 

k =1 


Without the need for re-running the forward model, this error allows for a fair assessment of how well the 
performance of a metamodel Ai p generalizes beyond the used experimental design. Yet, ?? calls for conducting 
K separate regressions for finding Ai^ k with an experimental design of the size K — 1. A remarkably simple 
result from linear regression analysis states that iAoo can be also computed as 


-Eloo 


1 

K 


E 


( M(x^) - M p {xW) 
y 1 - hk 


2 


(34) 


Here, A4 P is computed from the full experimental design X and h k denotes the k -th diagonal entry of the matrix 
A{A T A)~ l A T . This is more efficient since it does not require repeated fits. A derivation of the formula in ?? 
can be found in m- 

One may define eEmp = E'Emp/VarD)' 7 ] and cloo = -ELoo/Var)} 7 ] as normalized versions of the empirical 
and the LOO error, respectively. Here, Var[y] is the empirical variance of the response sample y. These 
normalized errors can be used in order to judge and compare the performance of metamodels. In turn, this 
enables a practical convergence analysis, e.g. by monitoring the errors over repeated metamodel computations 
for an increasing experimental design size K and expansion order p. 


4 Spectral Bayesian inference 

Different types of probability density approximations are encountered in statistical inference. This includes 
series expansions for the population density of a random data sample in nonparametric distribution fitting. 
Here, the unknown density of the data is either directly represented as a linear combination of polynomial 
basis functions m or as the product of a base density times a superposition of polynomials [103 . The latter 
type of expansion is also encountered in parametric density estimation of random data where one-dimensional 
posterior densities of the unknown parameter are expanded about a Gaussian baseline. This can be based 
on a Taylor sum at a maximum likelihood estimate mm or on Stein’s lemma and numerical integration 
fIMHI07| . Moreover, different types of likelihood approximations are encountered in inverse modeling. This 
includes direct approaches where the likelihood is approximated itself |1081 i ! 09 l and indirect methods where 
the likelihood is approximated based on a surrogate of the forward model |18H2iJ| . These techniques facilitate 
Bayesian inference within the limits of MCMC sampling. 

By linking likelihood approximations to density expansions, now we present a spectral formulation of 
Bayesian inference which targets the emulation of the posterior density. Based on the theoretical and computa¬ 
tional machinery of PCEs, the likelihood function itself is decomposed into polynomials that are orthogonal with 
respect to the prior distribution. This spectral likelihood expansion enables semi-analytic Bayesian inference. 
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Simple formulas are derived for the joint posterior density and its marginals. They are regarded as expansions 
of the posterior about the prior as the reference density. The model evidence is shown to be the coefficient of 
the constant expansion term. General Qol-expectations under the posterior and the first posterior moments 
are obtained through a mere postprocessing of the spectral coefficients. After a discussion of the advantages 
and shortcomings of the spectral method, a change of the reference density is proposed in order to improve the 
efficacy. 


4.1 Spectral likelihood expansions 

The authors C. Soize and R. Ghanem start their paper m with the following sentence: “Characterizing the 
membership of a mathematical function in the most suitable functional space is a critical step toward analyzing 
it and identifying sequences of efficient approximants to it.” As discussed in ??, given the data y and the 
statistical model f(y\x), the likelihood is the function 


C:V X ^ R+ 
x f(y\x). 


(35) 


It maps the parameter space V x into the set of non-negative real numbers R + . At this place we assume 
that the likelihood C £ L^(T> X ) is square integrable with respect to the prior. From a statistical point of 
view, this is a reasonable supposition which is necessary in order to invoke the theory of ??. On condition 
that the likelihood is bounded from above, the mean-square integrability follows immediately from the axioms 
of probability. Note that maximum likelihood estimates (MLE) implicitly rest upon this presumption. If 
®mle £ argmax^g^ C,(x) is a MLE, i.e. for all x £ T> x it applies that C(x) < L(x mle) < oo, then one trivially 
has f v C 2 (x) 7 t(x) dx < C 2 (x MLE ) f v tt{x) dx = C 2 (x MEE ) < oo. 

Having identified L^{T> X ) as a suitable function space for characterizing the likelihood, one can represent 
the likelihood with respect to the orthonormal basis { v k Q ,} a g ]N M. This representation is 


C = b a d! a , with 


lg]N M 


b a = (£, ^oc)li = J C{x)'d/ a (x)Tr{x)dx. 

’Da¬ 


rn 

(37) 


We refer to ???? as a spectral likelihood expansion (SLE). Notice that the SLE coefficients {5 a }aeiN M are data- 
dependent. This reflects the fact that the likelihood in ?? depends on the data. With the truncation scheme 
in ??, one can limit the infinite series in ?? to the finite number of terms for which a £ A p . A mean-square 
convergent response surface of the likelihood is then given as 

C p (x) = ^2 b a ^f a (x). (38) 

otCAp 


For the time being we assume that the coefficients of the SLE in ?? or its response surface in ?? are already 
known. One can then accomplish Bayesian inference by extracting the joint posterior density or a posterior 
density surrogate, its marginals and the corresponding Qol-expectations directly from the SLE. 


4.2 Joint posterior density 

We begin with the joint posterior density function and the model evidence. By plugging ?? in ?? one simply 
obtains the “nonparametric” expression 


n(x\y) 



(39) 


Due to the orthonormality of the basis, the model evidence is simply found as the coefficient of the constant 
SLE term. This is easily verified by writing ?? as 


Z={l,C) Ll 



(40) 
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The remarkably simple result in ?? completes the expression of the posterior density in ??. It is interesting to 
note that the posterior density is of the form 


n(x\y) = tt(x) I 1 + Y b 0 1 b a 't/ a (x) I . (41) 

\ aei M \{o) / 

In essence, the posterior density is here represented as a “perturbation” around the prior. The latter establishes 
the leading term of the expansion and acts as the reference density. The expression in ?? is reminiscent of an 
Edgeworth expansion for a density function in asymptotic statistics mum. 

4.3 Quantities of interest 

Based on the joint posterior density one can calculate the corresponding Qol-expectations in ??. At this point, 
the identities in ???? finally play a key role. They allow one to express and treat the posterior expectation of 
a Qol with h £ L^.(T> X ) as the weighted projection onto the likelihood 

Mh(X)\y] = [h(X)C(X)\ = i {h,C) Ll ■ (42) 

Let h = c a f a with c a = (/i,4' Q ,)i 2 be the Qol representation in the polynomial basis used. The 

general posterior expectation in ?? follows then from ?? by Parseval’s theorem 

nh(X)\y] = y / Y c a ^ ai Y = y Y (43) 

0 \aGlN M aGW" / L 2 0 aeIN M 

If the Qol h £ P p is known by a multivariate monomial representation of finite order, then its representation in 
the orthogonal basis can always be recovered by a change of basis. Examples that relate to the first posterior 
moments are given shortly hereafter. In a more complex case the Qol is a computational model itself and 
one would have to numerically compute a PCE surrogate. While ?? facilitates the propagation of the prior 
uncertainty, ?? promotes the propagation of the posterior uncertainty. 


4.4 Posterior marginals 

Now the posterior marginals in ?? are derived. For some j £ {1,..., M} let us introduce the new set of multi¬ 
indices = {(a 1 ,..., = 0 for i ^ j}. With a slight abuse of the notation, the sub-expansion of the 

SLE that only contains terms with a £ is denoted as 

£j{xj)= Y b o^<*{x) = Y 6 m ) ^ ) ( 2 : j)> where = 6 ( o,..,o, M> o,...,o)- (44) 

It collects all the polynomials that are constant in all variables x^j, i.e. they are non-constant in the single 
variable Xj only. In this sense the sub-expansion in ?? is virtually a function of Xj only. For the posterior 
marginal of the single unknown Xj in ?? one can derive 

Axj\y) = 7Tj( '^ J £{x) n{x^j) d x^j = ^Cjix^TTjixj). (45) 

Va. . 

These equalities apply due to the independent prior tt(x) = n(x r ^j)n.j(xj) in ??, the orthonormality of the 
univariate polynomials in ?? and the tensor structure of the multivariate ones in ??. For a pair j, k £ { 1 ,..., M} 
with j 7 ^ k let us introduce yet another set of multi-indices A^’ k ^ = {(aq,..., cum) \o.i = 0 for i ^ j , k}. The 
sub-expansion of the full SLE that only contains terms with a £ A^ ,kb is denoted as 

Cj, k (xj,x k )= Y b a d> a (x) = Y 

IN (46) 

where = b( o,...,o,/i,o,...,o,w,o,...,o)- 
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Since it only contains terms that are constant in x^jj-, the sub-expansion in ?? can be seen as a function of Xj 
and Xk■ The posterior density can then be marginalized as follows 


Tr(xj,x k \y)= J Tr(x\y)dx^^ k = —Cj tk (xj,x k )'Kj(xj)iT k (x k ). (47) 

Note that the dependency structure of n(xj,Xk\y ) in ?? is induced by those terms of Cj_ k (xj, x k ) that are not 
present in Cj(xj) and C k {x k ), he. the terms b^’v^^P {xj)^ k \x k ) with y, v ^ 0. 


4.5 First posterior moments 

With the two marginalizations of the posterior density in ???? one can calculate the entries of the posterior 
mean in ?? and the covariance matrix in ??. Let be defined such that Xj = d,Q^Q\xj)+d[^d>^\xj). 

With this univariate representation and ?? one easily obtains 

M x j\y\ = 7- ( x j,£j) L 2 = TT ( d o ) bo ) +d[ j) b[ ]) ) . (48) 

Note that one actually has bff^ = b 0 in this notation. Diagonal entries of the covariance matrix in ?? can 
be similarly deduced. Let {e^, e^\ e^} the coefficients of the univariate representation (Xj — E[X,-|y ]) 2 = 
J2 2 fi=o e M Then one simply has 

Var[^. \y\ = ^ ({xj - nXj\ y]f , C 3 ) ^ = £ £ e^bjfl (49) 


Finally, let {e^Q fc \ e.^ k \ efy k \ be the coefficients of the bivariate PCE with (xj— E[Ay \y])(x k — E[Xfc|y]) = 

v —o e ^v^^\ x j)^v\ x k)- For an off-diagonal entry of ?? one then finds 


Cov[Xj,X k \y\ 


1 


p U,k)h(j,k) 




(50) 


Notation-wise, ?????? may seem to be somewhat cumbersome. Nevertheless, they establish simple recipes of 
how to obtain the first posterior moments by a postprocessing of the low-degree SLE terms in closed-form. 
Higher-order moments could be obtained similarly. Some examples of how the corresponding Qols can be 
represented in terms of orthogonal polynomials can be found in the ??. 


4.6 Discussion of the advantages 

In spectral Bayesian inference the posterior is genuinely characterized through the SLE and its coefficients. 
The essential advantage of this approach is that all quantities of inferential relevance can be computed semi- 
analytically. Simple formulas for the joint posterior density and the model evidence emerge in ????. They 
allow to establish ?? as the posterior density surrogate. General Qol-expectations under the posterior are then 
calculated via Parseval’s formula in ??. The posterior marginals are obtained based on sub-expansions of the 
full SLE in ?? and the first posterior moments have closed-form expressions in ??????. These striking charac¬ 
teristics clearly distinguish spectral inference from integration and sampling approaches where the posterior is 
summarized by expected values or random draws only. As for the latter, one has to rely on kernel estimates of 
the posterior density and on empirical sample approximations of the Qol-expectations. Also, the model evidence 
is not computed explicitly. 

The practical computation of the SLE in ?? can be accomplished analogously to finding the PCE approxima¬ 
tion of the forward model in ??, e.g. by solving a linear least squares problem as in ??. This allows one to draw 
on the vast number of tools that were developed for carrying out this well-known type of regression analysis. An 
attractive feature of this procedure is that the prediction error of the obtained SLE acts as a natural convergence 
indicator. We recall that the LOO error in ?? can be efficiently evaluated as per ??, i.e. without the need for 
additional forward model runs or regression analyses. The existence of an intrinsic convergence criterion is an 
advantage over traditional MCMC techniques. Another advantage of the formulation its amenability to parallel 
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computations. While the workload posed by MCMC is inherently serial, running the forward model for each 
input in the experimental design is embarrassingly parallel. Parallelization is also possible on the level of the 
linear algebra operations that are necessary in order to solve the normal equations. 

4.7 Discussion of the shortcomings 

The approximate nature of SLE computations is twofold, i.e. only a finite number of terms are kept in the 
expansion and the coefficients are inexact. Unfortunately, a number of inconveniences may arise from these 
inevitable approximations. The SLE and the correspondingly computed posterior density could spuriously take 
on negative values. Also the estimated model evidence in ?? could take on negative values Z < 0. Still note 
that the approximate posterior in ?? always integrates to one. For reasonably adequate SLEs, we expect that 
negative values only occur in the distributional tails. Even so, the presence of negative values hampers the 
interpretation of the obtained posterior surrogate as a proper probability density, e.g. it leads to finite negative 
probabilities that are somehow irritating. From a more practical rather than a technical or philosophical 
perspective, densities are ultimately instrumental to the evaluation of more concrete quantities such as the 
expectations in ??. The severity of negative densities has thus to be judged with respect to the distortions 
of these relevant values. As long as their accurate approximation is guaranteed, the possibility of negative 
density values is an unavoidable artifact that can be regarded as a minor blemish. And the obtained surrogate 
density still proves to be expedient to effectively characterize the posterior distribution. In this light, it is 
more unpleasant that the a posteriori estimates of the model parameters may violate the restrictions that were 
imposed a priori. In ?? it could indeed happen that E[A ? \ y\ (j T> x . Estimations of the second order moments 
in ?? could result in unnatural values Var [Xj \ y] < 0, too. Although these problems cannot be remedied unless 
one solves an appropriately constrained version of ??, they unlikely occur if the SLE is sufficiently accurate. 
Anticipating outcomes from later numerical demonstrations, we remark that the occurrence of negative density 
values is observed, while unphysical or unnatural estimates of the first posterior moments are not found. 

The SLE decomposition into a globally smooth basis of tensorized polynomials suffers from some other 
intrinsic problems. Generally, there is the curse of dimensionality, i.e. the increase of the number of regressors in 
??. Furthermore, the SLE convergence rate in ?? depends on the regularity of the underlying likelihood function. 
For discontinuous forward or error models the SLE approximation with smooth polynomials converges only 
slowly. Likelihood functions often show a peaked structure around the posterior modes and a vanishing behavior 
elsewhere. Hence, any adequate superposition of polynomials has to capture those two different behavioral 
patterns through some kind of “constructive” and “destructive” interaction between its terms, respectively. Due 
to their global nature, the employed polynomial basis may not admit sparse likelihood representations. In turn, 
a high number of terms might be necessary in order to accurately represent even simple likelihood functions. 
Especially in the case of high-dimensional and unbounded parameter spaces this may cause severe practical 
problems. Of course, in ???? one could expand the likelihood in a different basis. Yet, note that the Qols in 
?? would also have to be expanded in that basis. 

The role of the prior for spectral inference is manifold. Initially the posterior expectations in ?? have been 
rewritten as the weighted prior expectations in ??. This formulation is accompanied by difficulties in comput¬ 
ing posteriors that strongly deviate from the prior. The same situation arises for crude MC integration and 
eventually motivates importance or MCMC sampling that allow to focus on localized regions of high posterior 
mass. Those difficulties become manifest if the prior acts as the sampling distribution for the experimental 
design. In this case, the SLE is only accurate over the regions of the parameter space that accumulate con¬ 
siderable shares of the total prior probability mass. Approximation errors of the SLE in regions that are less 
supported by the prior then induce errors in the computed posterior surrogate. Note that this difficulty is also 
encountered in MCMC posterior exploration with prior-based PCE metamodels. Also, the error estimate in ?? 
then only measures the SLE accuracy with respect to the prior which may be misleading for the assessment of 
the posterior accuracy. It is not clear how the errors of the likelihood expansion relate to the induced errors 
of the posterior surrogate and the posterior moments. Moreover, since the prior acts as the reference density 
of the posterior expansion in ??, the spectral SLE representation of significantly differing posteriors requires 
higher order corrections. Otherwise put, SLEs are expected to perform better for posteriors that only slightly 
update or perturb the prior. 


13 



4.8 Change of the reference density 

As just discussed, a major drawback of SLEs is their dependency on the prior ir as the reference density 
function. The errors are minimized and measured with respect to the prior and the posterior is represented 
as correction of the standard reference. In case that high-order corrections are required, SLEs also suffer 
from the curse of dimensionality. While fully maintaining the advantages of the spectral problem formulation, 
these shortcomings can be remedied through the introduction of an auxiliary density g over the prior support 
T> x that would optimally mimic the posterior in some sense. This reference density change allows for the 
construction of auxiliary expansions that are more accurate with respect to the posterior and for a more 
convenient series expansions of the joint posterior density. It is analogous to the adjustment of the integration 
weight in importance sampling, where the average over a distribution is replaced by a weighted average over 
another ancillary distribution. An iterative use of this reference change naturally allows for adaptive SLE 
approaches. 

Given that g[x) ^ 0 for all x £ T> x , one may define the auxiliary quantity Q = Cir/g. Under the additional 
assumption that Q £ L 2 g (fD x ), one can expand this quantity in terms of polynomials {4 r ® } a g H M that are 
orthogonal with respect to the auxiliary reference. Analogous to the expansion of C £ L 2 (V x ) in ????, this is 

r~ 

Q = — = E with 

^ a£l M 

b 9 a = {Q ,^ 9 a ) L 2 = j G{x)^ 9 a (x) g{x)dx = j C(x)^ 9 a (x) n(x) d® 

T> a 

The coefficients of this auxiliary SLE (aSLE) are denoted as {6® e } cte]N M. They equal the projections b 9 a = 
(£, 4'^)l 2 °f the likelihood onto the polynomials H/®,. Note that if the new reference g = n equals the prior, 
then Q = C is simply the likelihood and the formulation remains unchanged. If the density g = 7r(-1 y) equals 
the posterior, then the quantity Q = Z equals the model evidence. In this case the aSLE Q = = b 9 0 is 

a constant with a single nonzero term. If g ss 7r(-| y) only applies in an approximate sense, then one may still 
speculate that the aSLE is sparser than the corresponding SLE. 

As in importance sampling, one can then rewrite the expectation values under ir in ???? as expectations 
under g. Similar to ??, the model evidence then emerges again as the zeroth expansion term 

Z = J Q{x)g{x)dx = b 9 0 . (53) 

Let h = c a^a with c a = 4' o)l 2 be the auxiliary expansion of a Qol. Similar to ??, for general Qol 

posterior expectations one may then write 

HK X )\V\ = \ [ h{x)g{x)g{x)dx = ^ ( 54 ) 

^ 0 c*eiN M 

In accordance with the aSLE in ???? the joint density of the posterior distribution is obtained as the 
asymptotic series 

*(*| V) = G{X) ° {X) = I ( E b X(x)) 9(x) = 9(x) ( 1 + E ) • ( 55 ) 

0 VaelN" / \ aeIN"\{°} 0 J 

As opposed to ?? where the posterior density is represented around the prior n, in ?? the posterior is expanded 
about the new reference g. If the latter resembles the posterior adequately well, the formulation only calls for 
small corrections. 


(51) 

(52) 


5 Numerical examples 

Next, the potential and the difficulties of the theory presented in the preceding section are investigated. The 
goal is to give a proof of concept for the basic feasibility of spectral Bayesian inference. It is verified that the 
theory can be successfully applied in practice and further insight into its functioning is obtained. Moreover, 
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it is learned about its current shortcomings. Four instructive calibration problems from classical statistics 
and inverse modeling are solved for these purposes. The analysis is confined to problems with low-dimensional 
parameter spaces. First, the mean value of a normal distribution is inferred with random data under a conjugate 
normal prior. Second, the mean and standard deviation of a normal distribution are fitted for a joint prior with 
independent and uniform marginals. Third, an inverse heat conduction problem in two spatial dimensions with 
two unknowns is solved. Finally, a similar thermal problem with six unknowns is considered. Synthetically 
created pseudo-data are used in all these example applications. 

As it turns out, one can gain valuable insights into the characteristics of likelihood expansions and posterior 
emulators by way of comparison. Therefore, the analyses for the first three examples proceed analogously. 
For rich experimental designs, the convergence behavior of high-degree SLEs is studied by reference to the 
LOO error. More importantly, the capability of lower-degree SLEs to accurately capture the posterior Qol- 
expectations is explored for scarcer experimental designs. Eventually, aSLE-based posterior surrogates are 
investigated in order to mitigate the curse of dimensionality. All results are compared to reference solutions. 
Where possible, the exact solutions from a conjugate Bayesian analysis are used to this effect. Otherwise, 
corresponding approximations are computed via classical MCMC sampling. 

The uncertainty quantification platform UQLab [ 11411115] is used throughout the numerical demonstrations. 
It provides a flexible environment for the uncertainty analysis of engineering systems, e.g. for uncertainty 
propagation. In this context it ships with a range of regression tools that allow one to easily compute PCEs. 
These tools can be directly applied to the likelihood function in order to compute SLEs. OLS is employed as 
the standard solving routine in the following examples. 


5.1 ID normal fitting 

First of all, we consider the problem of fitting a Gaussian distribution N(jji\y,cr 2 ) to random realizations y, 
with i = 1,..., N. The goal is to estimate the unknown mean y whereas the standard deviation a is assumed 
to be already known. Given a Gaussian prior, this one-dimensional normal model with known variance exhibits 
a Gaussian posterior density. Moreover, a closed-form expression for the model evidence can be derived. Since 
this offers the possibility of comparing the SLE results with analytical solutions, this simple statistical model is 
used as a first SLE testbed. Let the data y = (j/i,..., 2 /at) t be comprised of N independent samples from the 
normal distribution. For the observational model one may then write 

N 

Y\n~ n J\f(yi\/i,(7 2 ), with known a 2 . (56) 

i—1 


Consequently, the likelihood function can be simply written as C(y) = Jlfli N(yi |/b cr 2 ). A Bayesian prior 
distribution 7r(/x) captures the epistemic uncertainty of the true value of y before the data analysis. For the 
posterior distribution, that aggregates the information about the unknown after the data have been analyzed, 
one then has n(y\y) = Z~ 1 C(y)n(y). 

The conjugate prior for the data model in ?? is a Gaussian 7r(/i) = Af(y\yo, Cg). Its mean yo = E[/i] 
and variance a 2 = Var[/z] have to be conveniently specified by the experimenter and data analyst. This prior 
choice ensures that the posterior is a Gaussian 7r(/x| y) = Af(y\yi\r,cr^) whose parameters y^ = E[/z|y] and 
a 2 N = Var[p|y] are easily found as 


M.iv 





(57) 


Here, y = N 1 Vi ' s the empirical sample mean of the data. Likewise, an explicit expression for the 
model evidence Z = f M {Yl I ^ =1 J^f(yi\y,cr 2 ))J\f(y\y 0 ,a'Q)dy can be derived. Let y 2 = N _1 YliLiVi denote the 
sample mean of the squared observations. A straightforward calculation based on simple algebra and a Gaussian 
integral then yields 


Z = <j. 


■'( 


— N 


<7v27T 







(58) 


For the following computer experiment, the parameters of the data distribution in ?? are specified as 
y = 10 and a = 5, respectively. In the course of the procedure only the mean is treated as an unknown, 
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whereas the standard deviation is assumed to be known. We consider a situation where N = 10 samples are 
randomly drawn from the data distribution. For the numerical experiment, the pseudo-random numbers y = 
(8.78,4.05,12.58, 3.60,11.05, 8.70, 20.80,1.23,19.36,12.07) t are used as synthetic data. The prior distribution 
is set to be a Gaussian n (p) = J\f(p\p 0 , ctq) with p 0 = 11.5 and a 0 = 1.5. 

5.1.1 Posterior density 

In order to better understand the principles of spectral Bayesian inference we now proceed as follows. Spectral 
expansions C p of the likelihood function C defined above are computed and compared for experimental designs 
of varying size K and polynomial terms of varying degree p. Hermite polynomials are used in combination 
with an appropriate linear transformation to standardized variables £ R with a Gaussian weight function 
A7 (£ m | 0, 1). Accordingly, the unknown can be represented as p = po + ooCm- The experimental designs are 
one-dimensional Sobol sequences that are appropriately transformed. 

First the convergence behavior and the accuracy of the likelihood approximation are analyzed. For a rich 
experimental design with K = 5 x 10 4 , SLEs are computed for an increasing order up to p = 20. The 
normalized empirical error EEmp and the normalized LOO error eloo are monitored over these computations. 
While the former can be directly computed according to its definition, the computation of the latter relies on 
the reformulation in ??. This serves the purpose of assessing the prediction accuracy of the computed SLE as 
a function of the degree p. The results are plotted in ??. It can be seen how the error estimates approach zero, 
i.e. the SLE converges to the likelihood function. For p = 20 the empirical error amounts to tEmp = 1-05 x 10~ 12 
and the LOO amounts to eloo = 1-82 x 10“ 10 . These small error magnitudes show that the likelihood function 
C can be indeed spectrally expanded in a Hermite basis. 



degree 


Figure 1: ID normal fitting: Convergence of the SLE. 

The functional likelihood approximation C p provided by the most accurate SLE with p = 20 is visualized 
in ??. Moreover, the plot shows a low-order SLE with p = 5 and K = lx 10 2 for which the error estimates 
EEmp = 2.61 x 10~ 4 and eloo = 8.41 x 10 -4 are obtained. For the sake of comparison the exact likelihood 
function C is shown as well. It can be seen that the SLEs are able to accurately represent the likelihood around 
its peak, i.e. roughly speaking in the interval p £ [8,15] for p = 5 and in p £ [5,18] for p = 20. Note that these 
regions accumulate the largest proportions of the total prior probability mass. Outside of these ranges, however, 
the SLEs £ p start strongly deviating from C and taking on negative values. These phenomena can be attributed 
to an imperfect polynomial cancellation of the finite series approximation of the likelihood in the regions of the 
parameter space that are only sparsely covered by the experimental design. Indeed, for unbounded parameter 
spaces it is clearly hopeless to achieve a global net cancellation of a finite polynomial expansion that is necessary 
in order to emulate the vanishing behavior of the likelihood far from its peaks. The extent to which this impacts 
on the approximation of the posterior density and its first moments is investigated next. 

Expanding the likelihood function is only a means to the end of surrogating the posterior density. Ap¬ 
proximations of the posterior density n(p\y) ss 1 £ p (p)w(p) are computed from the SLEs with p = 5 and 
p = 20 through ????. The results are plotted in ??. In addition to the SLE approximations, the prior density 
tt(p) = Af(p\p 0 , Ctq) and the exact solution n(p\y) = Af{p\p n, a%) from a conjugate analysis based on ?? are 
shown. The posterior surrogate for p = 5 shows minor deviations from the the analytical result, while the 
approximation for p = 20 perfectly matches the true density. It is noted that the discrepancies between C p and 
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C shown in ?? are attenuated. The underlying reason is that for large enough \p\ —► oo the exponential decay 
of the Gaussian prior 7 r(p) oc exp (—(p — po) 2 ) dominates the polynomial increase of C p (p) = 
in the sense that C p (p)Tr(p) —> 0. This absorbs the effects of the SLE approximation that is increasingly inad¬ 
equate for large values of |/i|. In this sense, the prior reference density guards the posterior surrogate against 
the inadequacies of the SLE. Therefore, the posterior emulation may very well be more accurate than the SLE 
approximation of the likelihood. 



Figure 2: ID normal fitting: Likelihood function. 



Figure 3: ID normal fitting: Posterior density. 


5.1.2 Quantities of interest 

Commonly one employs posterior means as parameter estimates and posterior standard deviations as measures 
of the estimation uncertainty. In order to investigate how well one can approximate the model evidence together 
with these meaningful quantities in spectral Bayesian inference, SLEs are computed for experimental designs 
of varying size K and for a selection of expansion orders p. The corresponding SLE-based approximations of 
the model evidence Z, the posterior mean p^ and the standard deviation a n are then computed from ??????. 
Note that the effects of the transformation to standard variables have to be appropriately taken care of at this 
place. This happens via ??. The SLE approximations can then be compared to the analytical solutions that 
are obtained from the conjugate analysis in ????. In ?? the results of this procedure are summarized. Note 
that all the SLE estimates attain admissible values, e.g. the model evidence is non-negative. Furthermore, it is 
noticed that Z , p^ and a jv can be recovered with high accuracy even for very scarce experimental designs and 
low-order SLEs, say for K = lx 10 3 and p = 10. It is concluded that, in some sense, the accurate estimation 
of the model evidence and the first posterior moments require significantly less computational effort than the 
accurate estimation of the posterior density. 

Table 2: ID normal fitting: Statistical quantities. 



I< 

V 

£loo 

Z [10 -15 ] 

Pn 

CTjv 


1 x 10 2 

5 

8.41 x 10" 4 

3.71 

10.85 

0.92 


5 x 10 2 

8 

2.49 x 10" 4 

3.75 

10.91 

1.14 

H 

1 x 10 3 

10 

2.58 x 10" 5 

3.74 

10.90 

1.07 

h-1 

CO 

5 x 10 3 

12 

8.21 x 10" 6 

3.74 

10.89 

1.09 


1 x 10 4 

15 

3.84 x 10" 7 

3.74 

10.89 

1.09 


5 x 10 4 

20 

1.82 x 10" 10 

3.74 

10.89 

1.09 


Exact results 

3.74 

10.89 

1.09 


5.2 2D normal fitting 

Next, we consider the problem of fitting both the unknown mean p and the standard deviation cr of a Gaussian 
distribution N(jji \p, a 2 ). A number of independent samples yi with i = 1,..., N from the normal distribution 
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constitute the available data y = ( 2 / 1 ,, Vn) T ■ The data model for this situation is written as 

N 

Y\y,cr ~ \y, cr 2 ). (59) 

2=1 

For the likelihood function one then has C(y,cr) = Ilfli-V(?/i|/r,cr 2 ). Given a Bayesian prior n(y,a), the 
posterior distribution is 7r(/z, cr\y) = Z~ 1 £(y, a)n(y, a). This distribution aggregates the information about the 
two unknowns after the data have been analyzed. 

The true values of the mean and standard deviation are set as y = 30 and er = 5, respectively. These 
values are treated as unknowns in the further course of the computer experiment. We consider a situation 
where N = 10 samples are randomly drawn from the distribution in ??. The pseudo-random numbers y = 
(31.23, 27.50, 24.91, 25.99, 32.88, 36.41, 27.81, 25.19, 37.96, 34.84) T are used as synthetic data. We consider an 
independent prior n(y,a) = ir(y)ir(a) with uniform marginals n(y) = U{y\y,y) and 7r(cr) = U{a\g_,a) over 
bounded supports = [y,y] = [20,40] and = [er, a] = [2, 10 ]. As opposed to the conjugate example above, 
this two-dimensional model does not permit a closed-form expression of the posterior density and the model 
evidence. 

5.2.1 Posterior density 

Now we proceed analogously to the investigation of the normal model with known variance. Expansions C p 
of the likelihood C are computed and contrasted for different experimental designs of size K and different 
polynomial orders p. An appropriate linear transformation to uniform standardized variables is applied such 
that the unknowns are represented as y = (y — y)/2 ■ + (y + y)/2 and a = (a — a)/2 ■ + (a + cf)/2, 

respectively. Here, £^,£ 0 - £ [— 1,1] are the corresponding standardized variables with a uniform weight function. 
Accordingly, tensorized Legendre polynomials form the trial basis. Two-dimensional Sobol sequences are utilized 
as uniformly space-filling experimental designs. 

As before, the speed of convergence and the prediction accuracy of the SLE are analyzed first. The normalized 
empirical error CEmp and the normalized LOO error £loo are therefore monitored throughout a series of runs 
that are conducted for an experimental design of the fixed size K = 1 x 10 5 and for an increasing expansion 
order up to p = 50. In ?? a corresponding plot is shown, where the convergence of the SLE C p to the likelihood 
function C is diagnosed. The reason that CE m p and eloo do not significantly differ is that the large size of the 
experimental design prevents overfitting. For p = 50 the normalized empirical error and the normalized LOO 
error are found as EEmp = 5.56 x 10~ n and eloo = 6.05 x 10 — 11 , respectively. This shows that the likelihood 
function C can be indeed expanded in the Legendre basis. For the uniform prior distribution that is used here, 
the normalized SLE errors effectively measure the errors of the posterior density. 



Figure 4: 2D normal fitting: Convergence of the SLE. 


Now the joint posterior density n(y,a\y) is computed and plotted in ??. For comparison purposes the 
posterior is sampled by means of MCMC simulation first. A simple random walk Metropolis (RWM) sampler 
with a Gaussian instrumental distribution is utilized. With this algorithm an unusually large number of 10' 
MCMC samples is drawn from the posterior. This serves the purpose of providing very accurate results that act 
as references for the SLE-based estimates. In ?? a normalized histogram of the obtained RWM sample is shown. 
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Next, the joint posterior density n(fi,a\y) ~ 1 C p {p, <r)n{p, a) is computed via ????. The SLE £ p (y,cr) with 
p = 50 that features the lowest LOO error is used. In ?? the posterior surrogate that arises from the SLE is 
plotted. For a later comparison with the heat conduction example, in ?? the SLE posterior surrogate from ?? is 
plotted again from a different angle. By visual inspection the obvious similarity between the density 7r(y, ajy) 
sampled by MCMC and emulated by the SLE is noticed. 



(a) MCMC reference sample. 


(b) SLE with p = 50. 



(c) SLE with p = 50. 


Figure 5: 2D normal fitting: Joint posterior. 


Now the posterior marginals 7r(y|y) and 7r(<r|y) are computed from the joint posterior. On the one hand, 
samples from the posterior marginals are obtained by restricting the analysis to the corresponding components of 
the joint MCMC sample. On the other hand, functional approximations of the posterior marginals are extracted 
based on sub-expansions £ PlP (y) and £ CT , p (er) of a joint SLE £ p (y, a) as in ????. For the SLEs with p = 9 and 
p = 50 the results are visualized in ??. Histogram-based MCMC sample representations and functional SLE 
approximations of the marginal densities are shown, too. As it can be seen, the marginal posteriors as obtained 
by MCMC and the SLE with p = 50 exactly match each other. For p = 9 the posteriors marginals display some 
wavelike fluctuations in their tails. 


5.2.2 Quantities of interest 

Since the posterior density itself is of little inferential use, the model evidence and the first posterior moments 
are computed for a selection of SLEs with varying size of the experimental design K and degree p. According 
to ????????, the SLE estimates of these quantities are obtained from the expansions coefficients. In ?? a 
summary of the results is given. Compliant with ?? the SLE estimates of the model evidence Z are obtained 
as the coefficient of the constant expansion term. According to ????, the SLE estimates of the posterior 
mean E[y|y] and the standard deviation Std[y|y] = Varfyjy] 1 / 2 of the location parameter y are computed. 
Likewise, the corresponding estimates for the spread parameter cr follow through a simple postprocessing of 
the low-order expansion coefficients. The SLE estimates of the linear coefficient of correlation p[y, <r|y] = 
Cov[y, ojy]/Std[y |y]/Std[ojy] are computed based on ??. Additionally, the LOO error £loo is listed to indicate 
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(b) Standard deviation a. 


Figure 6: 2D normal fitting: Posterior marginals. 


the SLE prediction accuracy. Note that all those estimates comply with the natural bounds and restrictions of 
the estimated quantities, e.g. the posterior means comply with the prior bounds. 

For the sake of comparison, associated results are listed for the simulated MCMC sample, too. These MCMC 
results are simply obtained as the corresponding sample approximations. The reference estimate of the model 
evidence is obtained by crude MC simulation instead, i.e. the arithmetic mean of the likelihood is computed for 
a number of 10 8 independent draws from the prior. It is interesting to note that the SLEs can reproduce the 
MCMC results for moderate experimental designs and degrees, say for K = 5 x 10 3 and p = 21. Even though 
a large number of input samples and a large polynomial degree is necessary to reproduce the shape of the joint 
posterior density, significantly smaller experimental designs and polynomial orders suffice to reproduce the first 
posterior moments. 


Table 3: 2D normal fitting: Statistical quantities. 




K 

P 

eLOO 


z [io- 14 ] 

E[y|y] 

E[cr|y] 

Std[/x|y] 

Std[<r|y] 

p[p,°\y] 


5 

X 

10 2 

5 

4.24 

X 

10- 

-1 

1.19 

30.34 

5.57 

2.03 

1.39 

0.18 


1 

X 

10 3 

9 

1.19 

X 

io- 

-1 

1.20 

30.39 

5.54 

2.01 

1.41 

0.08 

w 

5 

X 

10 3 

21 

9.64 

X 

io- 

-4 

1.18 

30.48 

5.56 

1.79 

1.38 

-0.01 

h-1 

CO 

1 

X 

10 4 

32 

5.86 

X 

io- 

-6 

1.18 

30.47 

5.56 

1.81 

1.38 

0.00 


5 

X 

10 4 

45 

1.30 

X 

io- 

-9 

1.18 

30.47 

5.56 

1.81 

1.38 

-0.00 


1 

X 

10 5 

50 

6.05 

X 

io- 

-11 

1.18 

30.47 

5.56 

1.81 

1.38 

-0.00 




(MC)MC 




1.18 

30.47 

5.56 

1.81 

1.38 

-0.00 


5.3 2D inverse heat conduction 

Finally, an inverse heat conduction problem (IHCP) is considered. The heat equation is a partial differential 
equation (PDE) that describes the distribution and evolution of heat in a system where conduction is the 
dominant mode of heat transfer. We consider a stationary heat equation of the form 

V • (kVT) = 0. (60) 

The temperature is denoted as T and the thermal conductivity is denoted as k. Commonly one is interested in 
the solution of the boundary value problem that is posed when ?? is satisfied over a physical domain subject 
to appropriate boundary conditions. We consider the steady state situation in two spatial dimensions. The 
Euclidean coordinate vector is denoted as r = (ri,r 2 ) T in the following. 

It is dealt with the identification of thermal conductivities of inclusions in a composite material with close- 
to-surface measurements of the temperature. The setup of the simplified thermal problem is visualized in ??. 
The thermal conductivity of the background matrix is denoted as kq, while the conductivities of the material 
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inclusions are termed as k\ and k 2 , respectively. It is assumed that the material properties are not subject to a 
further spatial variability. At the “top” of the domain a Dirichlet boundary condition Tj is imposed, while at the 
“bottom” the Neumann boundary condition <? 2 = — kq dT/dr 2 is imposed. Zero heat flux conditions dT/dr 1 = 0 
are imposed at the “left” and “right” hand side. 

We consider the IHCP that is posed when the thermal conductivities k = (ki,k 2 ) t are unknown and 
their inference is intended. With this in mind, a number of N measurements T = (T^Tq),..., T(rjv)) T of 
the temperature field at the measurement locations (rq,..., rqv) is available. The forward model A1: k 4 T 
establishes the connection between the data and the unknowns. It formalizes the operation of solving ?? for T as 
a function of k. Measured temperatures T = T + e consist of the corresponding model response T = A4(n) and 
a residual term e. The latter accounts for measurement uncertainty and forward model inadequacy. We consider 
residuals that are distributed according to a Gaussian A/”(e|0, S) with covariance matrix S. In compliance with 
?? the likelihood function is given as jC(k) = Af(T \£). Provided that a prior distribution 7 t(k) can be 
elicited, the posterior is given as 7 t(k|T) = Z~ 1 C{k)7t(k). 

The thermal conductivity of the background matrix is set to kq = 15W/m/K, while the thermal con¬ 
ductivities of the inclusions are specified as «q = 32W/m/K and k\ = 28W/m/K. The material properties 
of the inclusions are treated as unknowns subsequently. Moreover, the boundary conditions Tj = 200 K and 
q 2 = 2000 W/m 2 are imposed. A finite element (FE) model is used to solve a weak form of the governing 
PDE. The FE solution for the experimental setup described above is shown in ??. We consider a uniform prior 
distribution 7 t(k) = with independent marginals 7 t(ki) = U(k\ |k 1; Tci) and 7 t(k 2 ) = U(k 2 |k 2 , k 2 ). 

The prior bounds are chosen as wq = /c 2 = 20W/m/K and kq = k 2 = 40W/m/K, respectively. A number 
of N = 12 close-to-surface observations is analyzed. Their measurement locations are indicated by the black 
dots in ??. Independent Gaussian measurement noise with S = er£l and <jt = 0.25 K is considered. Based on 
this setup, synthetic data are simulated for conducting the computer experiment. This means that the forward 
model responses T for the true parameter setup are computed and pseudo-random noise is added in order to 
obtain T. 


Tj 



Figure 7: 2D IHCP: Heat conduction setup. 
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Figure 8: 2D IHCP: Steady state solution. 


5.3.1 Posterior density 

The analyses proceed analogously to the preceding section. By comparing the present IHCP and the non¬ 
conjugate Gaussian example, that have a two-dimensional parameter space and uniform priors in common, one 
can gain interesting insight into spectral Bayesian inference. First, the convergence behavior of the SLE is 
investigated. Spectral expansions C p of the likelihood C are therefore computed for an experimental design 
of size K = lx 10 5 and candidate bases with polynomials up to degree p = 50. All practical issues are 
handled analogously to the procedure in the non-conjugate Gaussian example. In ?? the normalized versions 
of the empirical error eEmp & n d the LOO error £loo are shown as a function of p. Comparing these results 
to ?? reveals that the convergence rate of the SLE C p is considerably slower than the corresponding one for 
the Gaussian example. For the SLE with p = 50 the error estimates amount to fEmp = 6.26 x ICC 4 and 
6loo = 7.56 x 1(T 4 . These errors are around seven orders of magnitude higher than the errors observed for the 
Gaussian example. The difference in the SLE convergence rate presumably originates from a difference in the 
underlying likelihood functions and posterior densities. This is now investigated in more detail. 
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Figure 9: 2D IHCP: Convergence of the SLE. 


A RWM approximation with 10' samples and the SLE-based emulation with p = 50 of the posterior density 
7r(«i, K 2 1 T) ~ &q 1 C p {k\, K2)7t(ki, K 2 ) are depicted in ??. In order to reduce the numerical cost of MCMC 
sampling, the FE model A4 is replaced by a PCE surrogate M p . For i = 1 separate PCEs Mi tP of 

the temperature Xj = AAi lP (ni, K 2 ) at the location are fitted as a function of the unknown conductivities. 
After an appropriate transformation to standardized variables, tensorized Legendre polynomials up to degree 
p = 10 act as the trial basis. Based on an experimental design of the size K = 10 3 , the LOO errors of the 
regressions amount to about £loo ~ 10 -10 . Accordingly, the PCE is considered an adequate replacement of 
the full FE model. Note that it would be also possible to use A4 P as a forward model surrogate during the 
likelihood training runs. 

The posteriors in ?? can be compared to the posteriors of the Gaussian example in ?? of the previous section. 
Relative to the respective prior, the posterior of the thermal problem 7 t(ki,K 2|T) contains more information 
than the posterior of the normal problem 7r(/z, cr|y), i.e. the likelihood C(k 1 , K 2 ) has a slightly more peaked and 
localized structure than £(y, a). In order to capture these different behaviors nearby and far from the posterior 
mode, the SLEs C p (n 1 , K 2 ) and X p (/r, &) require a different number of expansions terms. The more localized the 
posterior modes are with respect to the prior, the more terms are required in order to achieve the cancellation 
in the tails. Moreover, as opposed to 7r(/q a\y) the posterior n(ni, k, 2 \T) exhibits a pronounced correlation 
structure. In turn, this requires non-vanishing interaction terms. As a consequence, the SLE C p (k\,K 2 ) of the 
IHCP example is less accurate than the SLE C p (y,a) of the Gaussian example. This is also reflected in the 
fact that the posterior surrogate fluctuates and takes on negative values around the points an d [^ 1 ,^ 2 ]. 

In order to see this more clearly, the SLE posterior surrogate from ?? is plotted again from a different angle 
in ??. A small wavelike posterior structure spans the parameter space between these corners. These artifacts 
stem from an imperfect polynomial cancellation of the finite series approximation. This stands in contrast to 
the posterior of the Gaussian example in ?? where these phenomena were not observed. 

Via ?? the posterior marginals tt(ki\T) and tt(k 2 \T) can be extracted from the joint SLEs. The resulting 
densities are shown in ?? together with a histogram-based MCMC sample representation. As it can be seen, 
for p = 50 the marginals are captured fairly well, while the moderate-order surrogate for p = 21 still exhibits 
discrepancies at the bounds of the parameter space. The approximation of the posterior marginals by sub-SLEs 
seems to be more accurate, at least in the sense of the maximum deviation, than the approximation of the joint 
posterior 7 t(ki, K 2 \T) by the full SLE in ??. This phenomenon can be explained through the absence of all 
non-constant polynomial terms in the variables that are marginalized out. 

5.3.2 Quantities of interest 

Now we investigate how well one can extract the statistically interesting quantities. Results from SLEs with 
varying K and p are compared with the results from MCMC sampling. A summary of the findings is provided in 
??. The LOO error £loo of various SLEs is shown together with some basic posterior characteristics obtained 
by a postprocessing of the SLE coefficients. For j = 1,2 the posterior mean E[Kj |T] and the standard deviation 
Std[Kj|T] = Var[Kj IT] 1 / 2 of the posterior distribution are given in physical units ofW/m/K. In addition, the 
model evidence Z and the linear coefficient of correlation p[n 1 , K 2 \T] = Cov[ki, K 2 |T]/Std[Ki |T]/Std[/t 2 |T] are 
specified. In comparison to ??, where the results for the non-conjugate normal example are listed, the SLE 
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probability density k 2 [W/m/K] 



fti [W/m/K] 

(a) MCMC reference sample. 



fti [W/m/K] 

(b) SLE with p = 50. 



(c) SLE with p = 50. 


Figure 10: 2D IHCP: Joint posterior. 




Figure 11: 2D IHCP: Posterior marginals. 
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results for the IHCP match their MCMC counterparts less accurately. Nevertheless, it can be observed that the 
lowest-degree quantities of inferential interest can be extracted with a comparably small experimental design 
and relatively low number of regressors, say with I\ = lx 10 4 and p = 29. Note that all the estimates attain 
admissible values. 


Table 4: 2D IHCP: Statistical quantities. 



I\ 

V 

TOO 

z [io- 1 ] 

E[/si|T] 

E[k 2 | T] 

Std[«i |T] 

Std[fv2 T] 

p[ki,k 2 \T] 


5 x 10 2 

5 

8.24 x 10 _1 

8.45 

31.33 

28.36 

1.74 

1.33 

0.28 


1 x 10 3 

9 

6.08 x 10" 1 

7.81 

31.40 

28.22 

2.02 

1.53 

0.15 

w 

5 x 10 3 

21 

1.50 x 10” 1 

7.47 

31.32 

28.13 

2.16 

1.61 

0.34 

m 

1 x 10 4 

29 

5.79 x 10 -2 

7.21 

31.56 

28.30 

1.61 

1.39 

-0.05 


5 x 10 4 

35 

1.63 x 10" 2 

7.18 

31.62 

28.34 

1.24 

1.08 

-0.75 


1 x 10 5 

50 

7.56 x 10 -4 

7.18 

31.62 

28.33 

1.26 

1.10 

-0.68 


(MC)MC 

7.17 

31.62 

28.33 

1.26 

1.09 

-0.68 


5.4 6D inverse heat conduction 

In the previous sections it was demonstrated that likelihood functions can be indeed spectrally expanded and 
that the posterior density with its moments can be computed accordingly. For low-dimensional problems the 
SLE convergence behavior up to a high degree was studied by monitoring the LOO error. It was shown that the 
expansion error can be arbitrarily reduced by increasing the order of the expansion and adding samples to the 
experimental design. While this is reassuring to know, it does not help in solving higher-dimensional problems 
for which the computation of high-order expansions is exacerbated by the curse of dimensionality. Hence, now 
we want to investigate the applicability of SLEs and aSLEs in an inverse problem of moderate dimension. 

An IHCP in two spatial dimensions with six unknown conductivities is considered in this section. The 
setup of the problem is shown in ??. The M = 6 unknown conductivities k = (ki, ..., K6) T are inferred with 
N = 20 noisy measurements T = (Ti,...,T 2 o) T of the temperature field T. We set k 0 = 30W/m/K and 
k = (20, 24,..., 40) t W/m/K. The prior is set to a multivariate lognormal distribution 7 t(k) = JI® * ir(m) with 
independent marginals n(ni) = CAf(Ki\p 0 : a o) with p, 0 = 30 W/m/K and a 0 = 6 W/m/K. These parameters 
describe the mean po = lE[«i] and standard deviation ciq = Std[Ki] of the lognormal prior. They are related 
to the parameters of the associated normal distribution A/"(log(/Ci)|Ao,?o) y ia Mo = exp(Ao + Co/2) and ctq = 
(exp(cg) —1) exp(2Ao+Co)- Otherwise than that, the problem setup is exactly as described in the previous section, 
i.e. the likelihood function is given as C(k) = J\f(T\A4 (k), S). In accordance with this setup, in the following 
synthetic data are simulated and analyzed in order to compute the joint posterior 7 t(k|T) = Z^ 1 JZ(k)tt(k). 


T\ 



Figure 12: 6D IHCP: Heat conduction setup. 
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5.4.1 Posterior density 

The unknowns are represented as Ki = exp(Ao+Co£?) in terms of the standardized variables £ R with Gaussian 
weight functions J\f (Ci | 0,1). A spectral expansion C p in tensorized Hermite polynomials is then computed for 
p = 5 and K = 5 x 10 4 . The errors of the likelihood approximation are estimated as eEmp = 8.81 x 10” 1 
and £loo = 9.14 x 10 ” 1 . As compared to the low-dimensional examples that were studied before, these are 
large errors. An auxiliary reference density g(n) = f| (=1 g(Ki) is then constructed as a multivariate lognormal 
with independent marginals g(ni) = CJ\f(Ki\pi,af). The parameters of the latter are chosen as the means 
Hi = E[k,; |T] and standard deviations <Ji = Std[K, |T] of the posterior surrogate corresponding to the coefficients 
of SLE C p . We remark that this is a simple two-step procedure and that a more refined usage of the reference 
change would certainly lead to more sophisticated approaches. Subsequently, an aSLE Q p with p = 5 and 
K = 5 x 10 4 is computed. The errors amount to eEmp = 4.81 x 10“ 1 and £loo = 6.24 x 10” 4 . Notwithstanding 
that these errors are smaller than the corresponding errors of the SLE, they are still large as compared to the 
previous examples. Since these errors are now measured with respect to the auxiliary density which is expectedly 
closer to the true posterior than the prior is, the aSLE presumably leads to a more accurate posterior surrogate. 

From the previously computed SLE C(k) and the aSLE G p (k) approximations of the joint posterior density 
are computed via ????. The obtained surrogates tt(k\T) « C p (k)tt(k)/ b 0 and tt(k\T) ss G P {K)g(K)/b 9 0 are 
now compared to each other. We start with the one-dimensional marginals that can be compiled by collecting 
terms from the full expansions based on ??. For j = 1,..., 6 the marginals 7 \T) that are extracted that 
way are shown in ??. The marginal priors Tr(nj) and the auxiliary densities g(ftj) are shown, too. While the 
marginals that are taken from the SLE slightly deviate from their MCMC counterparts, the marginals based on 
the aSLE match their references perfectly well. The reason is that the posterior can be easier represented as a 
small adjustment of the auxiliary density than as a large correction to the prior. Thus, with the same expansion 
order the posterior is more accurately represented through the aSLE than through the SLE. Regarding the 
size of the error estimates, it is surprising that the marginals can be retrieved that well with the aSLE. Even 
though the SLE-based posterior approximations can hardly be interpreted as proper probability densities, i.e. 
they conspicuously take on negative values, the moments are recovered sufficiently well for the construction of 
the auxiliary reference density. 

On the basis of ?? the two-dimensional posterior marginals Tr(nj, Kk\T) can be constructed from the full 
expansions. For j = 3 and k = 4 the posterior marginal for the SLE C p is shown in ??. The same two- 
dimensional distribution is depicted in ?? for the aSLE G P ■ A histogram of the MCMC sample is provided in 
?? as a reference. As already found in ???? for instance, in ?? the aSLE-based surrogate appears to be almost 
exact whereas the SLE-based one is flattened out. Since the aSLE captures the true posterior density more 
accurately than the SLE, we expect similar findings for the posterior moments. 

5.4.2 Quantities of interest 

Finally we compute the model evidence and the first posterior moments with the aid of ???? and ??????. For 
the aSLE G P the analysis proceeds analogously to the SLE C p . In ?? a summary of the results is given. As 
it can be taken from the table, the aSLE consistently gives more accurate estimates of the reference values. 
This fulfills our earlier expectations. Regarding the inaccuracy of the SLE-based posterior marginals and 
the concerns about interpreting them as probability densities, the quality of the SLE-based estimates of the 
moments surpasses our expectations. In particular, the estimated standard deviations are more accurate than 
the surrogate marginals suggest, e.g. the ones shown in ????. Similar as for the posterior density, we have to 
conclude that the normalized LOO error does not give conclusive information about the accuracy of the first 
posterior moments. Nevertheless, it is remarked that the use of resampling methods still ensures a robust fit, 
i.e. it protects against overfitting. 

6 Concluding remarks 

A spectral approach to Bayesian inference that focuses on the surrogate modeling of the posterior density was 
devised. The likelihood was expanded in terms of polynomials that are orthogonal with respect to the prior 
weight. Ensuing from this spectral likelihood expansion (SLE), the joint posterior density was expressed as 
the prior that acts the reference density times a polynomial correction term. The normalization factor of the 
posterior emerged as the zeroth SLE coefficient and the posterior marginals were shown to be easily accessible 
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Figure 13: 6D IHCP: Posterior marginals. 
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(a) SLE with p = 5. (b) aSLE with p = 5. 



(c) MCMC reference sample. 


Figure 14: 6D IHCP: Posterior marginals. 


Table 5: 6D IHCP: Statistical quantities. 



z [nr 3 ] 

E[ki|T] 

E[/ 62 | T] 

e[k 3 | T] 

E[k 4 | T] 

E[/e s | T\ 

E[«e| T] 

SLE 

4.04 

21.42 

24.86 

28.79 

28.45 

34.43 

37.27 

aSLE 

3.68 

21.53 

24.48 

29.16 

28.57 

34.59 

36.95 

(MC)MC 

3.65 

21.52 

24.57 

29.11 

28.56 

34.64 

37.00 


Std[«i|T] 

Std[ K 2 |T] 

Std [« 3 |T] 

Std[/v 4 |T] 

Std[re 5 |T] 

Std[/t 6 T] 

p[k 1 ,k 2 \T\ 

SLE 

1.95 

3.43 

2.63 

2.43 

3.96 

3.13 

-0.40 

aSLE 

1.94 

3.56 

2.61 

2.33 

3.62 

2.99 

-0.44 

(MC)MC 

1.93 

3.48 

2.56 

2.31 

3.64 

3.00 

-0.47 


p[«i,« 3 |T] 

p[Ki,K4\T] 

p[k!,k 5 \T} 

p[ni,K e \T] 

p[k 2 ,k 3 \T] 

p[k2,k 4 \T\ 

p[k 2 ,k 5 \T) 

SLE 

0.19 

-0.39 

-0.28 

0.05 

-0.40 

-0.18 

-0.30 

aSLE 

-0.01 

-0.29 

-0.03 

0.10 

-0.48 

-0.17 

-0.28 

(MC)MC 

-0.02 

-0.32 

-0.03 

0.09 

-0.48 

-0.17 

-0.31 


P[k 2 ,k 6 \T] 

p[k 3 ,K4\T] 

p[k 3 ,k 5 \T] 

p[n 3 ,K 6 \T} 

p[k 4 ,k 5 \T] 

p[K 4 ,K e \T] 

p[k 5 ,k 6 \T] 

SLE 

-0.09 

-0.00 

0.22 

-0.22 

-0.20 

0.24 

-0.11 

aSLE 

-0.13 

0.11 

-0.02 

-0.32 

-0.24 

0.13 

-0.24 

(MC)MC 

-0.16 

0.10 

-0.03 

-0.34 

-0.26 

0.12 

-0.24 
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through sub-expansions of the SLE. Closed-form expressions for the first posterior moments in terms of the low- 
order spectral coefficients were given. Posterior uncertainty propagation through general quantities of interest 
was established via a postprocessing of the higher-order coefficients. The semi-analytic reformulation of Bayesian 
inference was founded on the theory and practice of metamodeling based on polynomial chaos expansions. 
This allows one to compute the SLE coefficients by solving a linear least squares problem. An analysis of 
the advantages and disadvantages of the proposed method eventually motivated a change of the reference 
density. While the expansion of the posterior in terms of the prior may require substantial modifications, its 
representation with respect to an auxiliary density many only require minor tweaks. 

The possibilities and difficulties that arise from the problem formulation were exhaustively discussed and 
numerically demonstrated. Fitting a parametric distribution to random data and identifying the thermal prop¬ 
erties of a composite material served as benchmark problems. These numerical experiments proved that spectral 
Bayesian inference works in principle and they provided insight into the mechanisms involved. The convergence 
behavior of the SLE was studied based on the leave-one-out error. It was found that high-degree SLEs are neces¬ 
sary in order to accurately represent the likelihood function and the joint posterior density, whereas lower-order 
SLEs are sufficient in order to extract the low-level quantities of interest. A change of the reference density 
allowed for reducing the order of the corrections required in order to represent the posterior with respect to the 
prior. This helped in alleviating the curse of dimensionality to some extent. 

In turn, a number of follow-up questions were given rise to. While the leave-one-out error performs well 
in quantifying the prediction errors of the SLE, it turned out to be of limited use with regard to the errors of 
the corresponding posterior surrogate and its marginals. A critical question thus relates to a means to assess 
the errors of these quantities and to diagnose their convergence. This would assist in choosing experimental 
designs of a sufficient size. Also, it would be desirable to quantify the estimation errors of individual expansion 
coefficients. This would support the assessment of the efficiency and scalability of the approach and the fair 
comparison with Monte Carlo, importance and Markov chain Monte Carlo sampling. Another question is 
whether a constrained optimization problem can be formulated that naturally respects all prior restrictions. This 
would remedy the potential problem of illegitimate values of the posterior moments. In order to handle a broader 
spectrum of statistical problems, SLEs would have to be extended to dependent prior distributions and noisy 
likelihood functions. For increasing the computational efficiency beyond the change of the reference density, it 
is conceivable to deploy advanced techniques from metamodeling and machine learning. This includes piecewise 
polynomial models, expansions in a favorable basis and the use of sparsity-promoting regression techniques. 
Yet another important issue concerns the practical applicability of the presented framework to problems with 
higher-dimensional parameter spaces. In future research efforts we will try to address the abovementioned issues 
and to answer this principal question. 


Appendix 

A Univariate polynomials 

The main properties of two classical orthogonal families of polynomials were shortly summarized in ??, i.e. the 
domain of definition, the associated weight function and the norm. Their first six members of these univariate 
Hermite polynomials {H a } a ^and Legendre polynomials {P Q } a g]N are listed in ??. Higher order members 
can be defined via recursive or differential relations. These polynomials can be used for the construction of the 
multivariate polynomial basis {T Q } ae ]N in ??. Note that this orthonormal basis is normalized via = H a /\fa\ 
or = Pjy/ i/(2a + l). 


Table 6: Low-order polynomials. 


a 

H a (x), x £ R 

P a (x), x G [-1,1] 

0 

1 

1 

1 

X 

X 

2 

x 1 — 1 

(3x 2 - l)/2 

3 

x 3 — 3a; 

(5a: 3 -3x)/2 

4 

x A — 6x 2 + 3 

(35a; 4 — 30a; 2 + 3)/8 

5 

x 5 — lCte 3 + 15a; 

(63a; 5 — 70a; 3 + 15x)/8 
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B Low-order Qols 

The representation of six low-order Qols in terms of the normalized Hermite and Legendre polynomials is given 
in ?? below. Those expansions can be used in order to compute the first posterior moments, e.g. as shown in 
??????. Note that the representations in the orthonormal bases directly follow from a change of basis and the 
substitutions H a = y/a \\and P a = + l)5' a . 

Table 7: Low-order Qols. 


Qol 

Hermite expansion 

Legendre expansion 

1 

v I'o 

d'o 

X 



X 2 

%/ 2 T 2 + 

(24' 2 /\/5 + 4/ 0 )/3 

X 3 

x/eTa + 3'Ll 

(24' 3 / v / 7 + 3 ^i/v / 3)/5 

X 4 

2v / 6'k 4 + 6 v / 2^ 2 + 3^0 

(84/4/3 + 204' 2 /\/5 + 74' 0 )/35 

X 5 

2v / 30^ 5 + lOv^a + 154>i 

( 8 W 5 /VTT + 284' 3 /v / 7 + 27T 1 / v / 3)/63 
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